-
Notifications
You must be signed in to change notification settings - Fork 16
feat: add metrics to orchestrator and relayer #674
Conversation
# Conflicts: # e2e/go.sum # go.mod
WalkthroughThe changes indicate the integration of telemetry and metrics collection functionality across various components of the blobstream service. The focus is on enabling and configuring OpenTelemetry Protocol (OTLP) metrics with an HTTP exporter, incorporating Prometheus for monitoring, and adjusting function calls and configurations to support these enhancements. The modifications suggest a concerted effort to improve observability within the service's infrastructure. Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChat with CodeRabbit Bot (
|
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #674 +/- ##
==========================================
- Coverage 24.56% 24.38% -0.18%
==========================================
Files 29 29
Lines 3212 3227 +15
==========================================
- Hits 789 787 -2
- Misses 2328 2343 +15
- Partials 95 97 +2 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 2
Configuration used: CodeRabbit UI
Files ignored due to filter (9)
- e2e/celestia-app/genesis.json
- e2e/docker-compose.yml
- e2e/go.mod
- e2e/go.sum
- e2e/telemetry/grafana/datasources/config.yml
- e2e/telemetry/otel-collector/config.yml
- e2e/telemetry/prometheus/prometheus.yml
- go.mod
- go.sum
Files selected for processing (15)
- cmd/blobstream/base/config.go (2 hunks)
- cmd/blobstream/bootstrapper/cmd.go (1 hunks)
- cmd/blobstream/common/helpers.go (3 hunks)
- cmd/blobstream/orchestrator/cmd.go (3 hunks)
- cmd/blobstream/orchestrator/config.go (6 hunks)
- cmd/blobstream/relayer/cmd.go (3 hunks)
- cmd/blobstream/relayer/config.go (6 hunks)
- e2e/scripts/start_orchestrator_after_validator_created.sh (2 hunks)
- e2e/scripts/start_relayer.sh (1 hunks)
- orchestrator/orchestrator.go (5 hunks)
- p2p/host.go (2 hunks)
- p2p/host_test.go (2 hunks)
- relayer/relayer.go (6 hunks)
- telemetry/metrics.go (1 hunks)
- testing/blobstream.go (3 hunks)
Files skipped from review due to trivial changes (1)
- e2e/scripts/start_orchestrator_after_validator_created.sh
Additional comments: 56
p2p/host.go (1)
- 22-35: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [15-30]
The addition of the
prometheus.Registerer
parameter to theCreateHost
function is correctly used to conditionally enable Prometheus metrics. Ensure that all calls toCreateHost
are updated to pass this new parameter.Verification successful
The calls to
CreateHost
inp2p/host_test.go
andcmd/blobstream/bootstrapper/cmd.go
passnil
for theregisterer
parameter, which is consistent with the conditional logic inCreateHost
that handles anil
registerer
by disabling metrics. The call incmd/blobstream/common/helpers.go
passes aregisterer
variable, which suggests that it is intended to enable metrics.
- Calls to
CreateHost
withnil
as theregisterer
parameter are valid and intended to disable metrics.- The call in
cmd/blobstream/common/helpers.go
is correctly passing aregisterer
to enable metrics.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for calls to CreateHost to ensure they pass the new registerer parameter. rg --type go 'p2p.CreateHost\('Length of output: 499
p2p/host_test.go (1)
- 48-48: The update to pass
nil
for theregisterer
parameter in test function calls is appropriate when Prometheus metrics are not being tested.testing/blobstream.go (1)
- 54-56: Initialization of telemetry meters and their inclusion in the
NewRelayer
function is correctly implemented. Ensure that error handling for telemetry initialization is consistent throughout the codebase.Verification successful
The search results indicate that telemetry meter initialization is consistent across the codebase, with error handling present in each instance where telemetry meters are initialized. This confirms that the implementation in
testing/blobstream.go
is in line with the rest of the codebase.
- Initialization of telemetry meters is consistently done with error handling in both
testing/blobstream.go
and thecmd.go
files for the relayer and orchestrator components.- The
require.NoError(t, err)
pattern in the test file is a common way to handle errors in test cases, ensuring that the test fails immediately if an error occurs during the initialization of telemetry meters.Based on the provided context and the results from the executed script, the review comment is consistent with the codebase.
Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for telemetry meter initialization to ensure consistent error handling. rg --type go 'telemetry.Init.*Meters\('Length of output: 367
cmd/blobstream/bootstrapper/cmd.go (1)
- 88-88: The update to pass
nil
for theregisterer
parameter in theCreateHost
call is correct. Consider if metrics should be enabled for the bootstrapper and if so, pass an appropriateregisterer
.cmd/blobstream/common/helpers.go (1)
- 71-77: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [74-83]
The addition of the
registerer
parameter to theCreateDHTAndWaitForPeers
function is correctly used to support Prometheus metrics. Ensure that all calls to this function are updated to pass the new parameter.Verification successful
The output from the
rg
command shows us three occurrences of theCreateDHTAndWaitForPeers
function call:
- In
cmd/blobstream/orchestrator/cmd.go
, the function is called but the snippet does not show whether theregisterer
parameter is passed.- In
cmd/blobstream/relayer/cmd.go
, the function is called with theregisterer
parameter, which is correct.- The declaration of the
CreateDHTAndWaitForPeers
function incmd/blobstream/common/helpers.go
shows the addition of theregisterer
parameter.To verify the correctness of the integration, we need to ensure that all calls to
CreateDHTAndWaitForPeers
have been updated to include the newregisterer
parameter. The output provided shows that therelayer/cmd.go
file has been updated correctly, but we need to check theorchestrator/cmd.go
file to confirm that theregisterer
parameter is also passed there.
The output from the
rg
command for theorchestrator/cmd.go
file does not show the complete function call forCreateDHTAndWaitForPeers
. The snippet ends after thes.P2PKeyStore
argument, and we cannot see if theregisterer
parameter is included in the call.To verify if the
registerer
parameter is passed correctly in theorchestrator/cmd.go
file, we need to see more lines of context around the function call.
The extended context provided from the
orchestrator/cmd.go
file confirms that theCreateDHTAndWaitForPeers
function is indeed called with theregisterer
parameter. This aligns with the changes made to the function's signature in thehelpers.go
file, where theregisterer
parameter was added to support Prometheus metrics.With this confirmation, we can conclude that the calls to
CreateDHTAndWaitForPeers
in both theorchestrator/cmd.go
andrelayer/cmd.go
files have been updated correctly to pass the newregisterer
parameter.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for calls to CreateDHTAndWaitForPeers to ensure they pass the new registerer parameter. rg --type go 'CreateDHTAndWaitForPeers\('Length of output: 382
Script:
#!/bin/bash # Check the call to CreateDHTAndWaitForPeers in orchestrator/cmd.go to ensure the registerer parameter is passed. rg --type go -C 3 'CreateDHTAndWaitForPeers\(' cmd/blobstream/orchestrator/cmd.goLength of output: 248
Script:
#!/bin/bash # Extend the context lines for the call to CreateDHTAndWaitForPeers in orchestrator/cmd.go to ensure the registerer parameter is passed. rg --type go -C 10 'CreateDHTAndWaitForPeers\(' cmd/blobstream/orchestrator/cmd.goLength of output: 479
telemetry/metrics.go (9)
1-25: The package declaration and imports look standard and appropriate for the functionality described. The constants defined are clear and self-explanatory.
27-33: The
Config
struct is well-defined with appropriate field tags for deserialization. It's important to ensure that the JSON and mapstructure tags match the expected configuration format.35-35: The global
meter
variable is initialized correctly. However, it's important to ensure that this global state does not introduce any side effects, especially in a concurrent environment.Verification successful
Based on the provided script output, the global
meter
variable is used to create counters and histograms in a manner that includes error handling and does not show any unsafe concurrent writes or manipulations. The usage pattern aligns with thread-safe practices as per the OpenTelemetry specification. No thread safety issues have been identified from the current context.* 37-42: The `OrchestratorMeters` struct is well-defined with appropriate metric types. It's important to ensure that the metric names are unique and descriptive enough to avoid any confusion when monitoring.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify that the global meter variable is safely accessed across the application. rg --type go "meter"Length of output: 2784
Script:
#!/bin/bash # Verify the thread safety of the methods where the `meter` is used to create counters and histograms. # We will look for the methods `Int64Counter`, `Float64Histogram` and ensure they are used in a thread-safe manner. rg --type go -A 5 "meter\.Int64Counter\(" rg --type go -A 5 "meter\.Float64Histogram\("Length of output: 2590
44-74: The
InitOrchestratorMeters
function initializes the metrics correctly. Error handling is present, and the function returns an error if any metric cannot be created, which is good practice.77-81: The
RelayerMeters
struct mirrors theOrchestratorMeters
struct in structure and purpose. It's important to ensure consistency in metric naming conventions across different components.83-106: The
InitRelayerMeters
function is similar toInitOrchestratorMeters
and follows the same best practices in error handling and metric initialization.109-152: The
Start
function sets up the OpenTelemetry metrics exporter with proper error handling and logging. It's important to ensure that theShutdown
function is called on application exit to cleanly release resources.154-184: The
PrometheusMetrics
function sets up an HTTP server for Prometheus metrics scraping. It's important to ensure that the server is securely configured and that theShutdown
function is called to prevent resource leaks.Verification successful
To verify the secure configuration of the HTTP server and the proper use of the shutdown function, we need to inspect the calls to
PrometheusMetrics
incmd/blobstream/relayer/cmd.go
andcmd/blobstream/orchestrator/cmd.go
to see how the shutdown function is handled.
To confirm that the shutdown functions are properly called on application exit, we need to find the usage of the
stopFuncs
slice and verify that it is iterated over to call each shutdown function.
The outputs from the
rg
command show that in bothcmd/blobstream/relayer/cmd.go
andcmd/blobstream/orchestrator/cmd.go
, thestopFuncs
slice is used to store shutdown functions, including the one returned bytelemetry.PrometheusMetrics
. A deferred anonymous function is set up to iterate overstopFuncs
and call each shutdown function, logging any errors that occur. This pattern ensures that resources are cleaned up when the application exits, either normally or due to an error.Based on the provided context, the following can be confirmed:
- The HTTP server for Prometheus metrics is set up with a shutdown function.
- The shutdown function is stored in a slice called
stopFuncs
.- On application exit, all functions in
stopFuncs
are called to perform cleanup.This addresses the reviewer's concern about ensuring that the server is securely configured and that the
Shutdown
function is called to prevent resource leaks.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify that the HTTP server for Prometheus metrics is securely configured and that the shutdown function is properly called on application exit. rg --type go "PrometheusMetrics"Length of output: 554
Script:
#!/bin/bash # Inspect the usage of PrometheusMetrics in relayer/cmd.go and orchestrator/cmd.go rg --type go -C 10 "telemetry.PrometheusMetrics" cmd/blobstream/relayer/cmd.go rg --type go -C 10 "telemetry.PrometheusMetrics" cmd/blobstream/orchestrator/cmd.goLength of output: 1431
Script:
#!/bin/bash # Inspect the usage of stopFuncs to ensure that the shutdown functions are called on application exit. rg --type go -C 10 "stopFuncs" cmd/blobstream/relayer/cmd.go rg --type go -C 10 "stopFuncs" cmd/blobstream/orchestrator/cmd.goLength of output: 5741
cmd/blobstream/orchestrator/cmd.go (5)
8-11: The added imports for telemetry and Prometheus are necessary for the new functionality. Ensure that these packages are used appropriately throughout the file.
117-120: The initialization of
orchestratorMeters
is done correctly. Ensure that error handling is consistent and that any initialization error is properly logged and handled.122-146: The conditional logic for metrics configuration is well-structured. It's important to ensure that the configuration is validated and that the metrics endpoint is properly secured, especially if TLS is disabled.
151-160: The
CreateDHTAndWaitForPeers
function call now includes aregisterer
parameter. Ensure that the function's signature is updated accordingly and that the parameter is used correctly within the function.Verification successful
The output from the script confirms that the
CreateDHTAndWaitForPeers
function inhelpers.go
has been updated to include aregisterer
parameter of typeprometheus.Registerer
. This matches the change observed in thecmd.go
file where theregisterer
is passed as an argument to the function call.
- The function signature in
helpers.go
includes theregisterer
parameter.- The
cmd.go
files for both the orchestrator and relayer pass theregisterer
argument to the function.Based on the provided information, the review comment has been addressed correctly in the codebase.
* 184-184: The `orchestratorMeters` is passed to the `New` function of the orchestrator. Ensure that the orchestrator uses these meters correctly and that the metrics are meaningful and accurate.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify that the CreateDHTAndWaitForPeers function signature is updated and that the registerer parameter is used correctly. rg --type go "CreateDHTAndWaitForPeers"Length of output: 532
Script:
#!/bin/bash # Look at the definition of CreateDHTAndWaitForPeers in helpers.go to verify the signature and usage of the registerer parameter. rg --type go -A 10 "func CreateDHTAndWaitForPeers" cmd/blobstream/common/helpers.goLength of output: 404
cmd/blobstream/relayer/cmd.go (5)
5-12: The added imports for telemetry and Prometheus are necessary for the new functionality. Ensure that these packages are used appropriately throughout the file.
173-176: The initialization of
relayerMeters
is done correctly. Ensure that error handling is consistent and that any initialization error is properly logged and handled.178-208: The conditional logic for metrics configuration is well-structured. It's important to ensure that the configuration is validated and that the metrics endpoint is properly secured, especially if TLS is disabled.
210-210: The
CreateDHTAndWaitForPeers
function call now includes aregisterer
parameter. Ensure that the function's signature is updated accordingly and that the parameter is used correctly within the function.Verification successful
The function signature for
CreateDHTAndWaitForPeers
incmd/blobstream/common/helpers.go
has been successfully updated to include theregisterer
parameter of typeprometheus.Registerer
. This confirms that the function's signature is updated accordingly as mentioned in the review comment.* 252-252: The `relayerMeters` is passed to the `NewRelayer` function. Ensure that the relayer uses these meters correctly and that the metrics are meaningful and accurate.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify that the CreateDHTAndWaitForPeers function signature is updated and that the registerer parameter is used correctly. rg --type go "CreateDHTAndWaitForPeers"Length of output: 532
Script:
#!/bin/bash # Check the updated signature of CreateDHTAndWaitForPeers in helpers.go and verify the usage of the registerer parameter. rg --type go -A 10 "func CreateDHTAndWaitForPeers\(" cmd/blobstream/common/helpers.goLength of output: 406
cmd/blobstream/orchestrator/config.go (6)
11-12: The import of the
telemetry
package is correctly added to support the new telemetry features.51-65: The configuration template has been correctly updated to include telemetry settings. Ensure that the default values align with the intended defaults for the application and that the documentation is updated to reflect these new configuration options.
84-87: New flags for telemetry configuration have been added to the
addOrchestratorFlags
function. Verify that the flag descriptions are clear and consistent with the rest of the application, and that they are documented in the user guide or help output.103-103: The
MetricsConfig
field of typetelemetry.Config
has been added to theStartConfig
struct. Ensure that the telemetry package'sConfig
struct is designed to be marshaled into and out of this configuration format without issues.113-117: The
DefaultStartConfig
function has been updated with default values for the telemetry configuration. Confirm that these defaults are sensible for most deployments and that they are secure (e.g., TLS disabled by default may need justification).199-229: The
parseOrchestratorFlags
function has been updated to parse the new telemetry flags. Verify that the error handling is consistent with the rest of the application and that the flags are correctly updating theStartConfig
struct.cmd/blobstream/relayer/config.go (6)
11-12: The import of the
telemetry
package is correctly added to support the new telemetry features.73-87: The configuration template has been correctly updated to include telemetry settings. Ensure that the default values align with the intended defaults for the application and that the documentation is updated to reflect these new configuration options.
113-116: New flags for telemetry configuration have been added to the
addRelayerStartFlags
function. Verify that the flag descriptions are clear and consistent with the rest of the application, and that they are documented in the user guide or help output.139-139: The
MetricsConfig
field of typetelemetry.Config
has been added to theStartConfig
struct. Ensure that the telemetry package'sConfig
struct is designed to be marshaled into and out of this configuration format without issues.153-157: The
DefaultStartConfig
function has been updated with default values for the telemetry configuration. Confirm that these defaults are sensible for most deployments and that they are secure (e.g., TLS disabled by default may need justification).307-337: The
parseRelayerStartFlags
function has been updated to parse the new telemetry flags. Verify that the error handling is consistent with the rest of the application and that the flags are correctly updating theStartConfig
struct.cmd/blobstream/base/config.go (9)
92-95: Added constants for metrics configuration flags. Ensure these constants are documented and used consistently throughout the codebase.
401-406: The
AddMetricsFlag
function correctly adds a boolean flag for enabling OTLP metrics. Ensure that the flag is being handled appropriately where it's used.409-415: The
GetMetricsFlag
function correctly retrieves the value of the metrics flag. Verify that the returned values are being checked and handled correctly in the calling code.418-423: The
AddMetricsEndpointFlag
function correctly adds a string flag for the metrics endpoint. Ensure that the default value and flag description are appropriate and that the flag is used consistently.426-432: The
GetMetricsEndpointFlag
function correctly retrieves the value of the metrics endpoint flag. Verify that the returned values are being checked and handled correctly in the calling code.435-440: The
AddMetricsTLSFlag
function correctly adds a boolean flag for enabling TLS for the metrics backend. Ensure that the flag is being handled appropriately where it's used.443-449: The
GetMetricsTLSFlag
function correctly retrieves the value of the metrics TLS flag. Verify that the returned values are being checked and handled correctly in the calling code.452-457: The
AddP2PMetricsEndpoint
function correctly adds a string flag for the LibP2P metrics endpoint. Ensure that the default value and flag description are appropriate and that the flag is used consistently.460-466: The
GetP2PMetricsEndpointFlag
function correctly retrieves the value of the P2P metrics endpoint flag. Verify that the returned values are being checked and handled correctly in the calling code.orchestrator/orchestrator.go (6)
50-50: Added
Meters *telemetry.OrchestratorMeters
field to theOrchestrator
struct. Ensure that the new field is properly initialized and used throughout the orchestrator's methods.59-65: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [62-73]
The
New
function now acceptsmeters *telemetry.OrchestratorMeters
and initializes theMeters
field. Ensure that all instances whereNew
is called are updated to pass the new parameter.
274-274: Telemetry metric
ReprocessedNonces
is incremented when a nonce is requeued. Verify that this metric is incremented in all appropriate places where a nonce is reprocessed.282-282: Telemetry metric
ProcessingTime
is recorded after a failed attempt to process a nonce. Ensure that the metric is recorded accurately and reflects the time taken for processing attempts.287-287: Telemetry metric
FailedNonces
is incremented when nonce processing fails after retries. Verify that this metric is incremented in all appropriate places where nonce processing fails.292-293: Telemetry metric
ProcessedNonces
is incremented after a nonce is processed. Ensure that this metric is incremented in all appropriate places where a nonce is successfully processed.relayer/relayer.go (5)
13-14: Added import for
telemetry
package.51-51: Added
Meters
field of type*telemetry.RelayerMeters
to theRelayer
struct. Ensure that all instances ofRelayer
are initialized with this new field to avoid nil pointer dereferences.Verification successful
The output from the shell script indicates that the
Relayer
struct is being initialized with theMeters
field atrelayer/relayer.go:78
. This confirms that the newMeters
field is being considered in the initialization of theRelayer
struct, and there are no instances found where theRelayer
struct is initialized without the newMeters
field.
- The
Relayer
struct is properly initialized with theMeters
field in the provided instance.* 62-68: > Note: This review was outside the patches, so it was mapped to the patch with the greatest overlap. Original lines [65-78]Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for instances where Relayer struct is initialized without the new Meters field. ast-grep --lang go --pattern $'Relayer{$$$}'Length of output: 788
Modified the
NewRelayer
function signature to includemeters *telemetry.RelayerMeters
parameter and correctly initialized theMeters
field in the returnedRelayer
instance.
126-127: Integration of telemetry metrics recording in the
Start
method by adding code to record processing time. Ensure that theStart
method is always called with a valid context that includes the necessary telemetry labels.148-148: Integration of telemetry metrics recording in the
Start
method by adding code to record failures. Ensure that the failure metric is incremented in all relevant error paths.Verification successful
The verification process has been completed successfully. The review comment regarding the integration of telemetry metrics recording in the
Start
method by adding code to record failures has been addressed. The failure metric is incremented in all relevant error paths within therelayer/relayer.go
file.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for error handling blocks where the failure metric should be incremented but might be missing. ast-grep --lang go --pattern $'if $_ { $$$ r.Meters.Failures.Add($$$) $$$ }'Length of output: 77
Script:
#!/bin/bash # Search for if statements that likely contain error handling logic but do not contain the metric increment call. ast-grep --lang go --pattern $'if $_ { $$$ }' | rg --invert-match 'r.Meters.Failures.Add'Length of output: 179794
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files ignored due to filter (7)
- e2e/docker-compose.yml
- e2e/go.mod
- e2e/go.sum
- e2e/telemetry/prometheus/prometheus.yml
- go.mod
- go.sum
- go.work.sum
Files selected for processing (2)
- e2e/scripts/start_orchestrator_after_validator_created.sh (3 hunks)
- e2e/scripts/start_relayer.sh (2 hunks)
Files skipped from review as they are similar to previous changes (2)
- e2e/scripts/start_orchestrator_after_validator_created.sh
- e2e/scripts/start_relayer.sh
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Overview
Adds metrics to orchestrator and relayer
Checklist
Summary by CodeRabbit
New Features
Enhancements
Documentation
Refactor
Tests